
add MultiPandasIndex helper class #7182

Draft · wants to merge 2 commits into base: main

Conversation

@benbovy (Member) commented Oct 18, 2022

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

This PR adds a xarray.indexes.MultiPandasIndex helper class for building custom meta-indexes that encapsulate multiple PandasIndex instances. Unlike PandasMultiIndex, meta-index classes inheriting from this helper may encapsulate loosely coupled (pandas) indexes whose coordinates have differing dimensions (each coordinate must still be 1-dimensional, but an Xarray index may be created from coordinates spanning different dimensions).

Early prototype in this notebook
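To make the idea concrete, here is a minimal, self-contained sketch of the "meta-index" concept using plain pandas indexes. The class name and `sel` signature are illustrative only, not the actual MultiPandasIndex API from this PR:

```python
import pandas as pd

# Hypothetical sketch: wrap several loosely coupled pandas indexes, each
# owning its own 1-dimensional coordinate, and dispatch label lookups to
# whichever encapsulated index owns the queried coordinate.
class MetaIndexSketch:
    __slots__ = ("indexes",)

    def __init__(self, indexes):
        # mapping of coordinate name -> pd.Index
        self.indexes = dict(indexes)

    def sel(self, labels):
        # delegate each label to the pandas index for that coordinate
        return {
            name: self.indexes[name].get_loc(label)
            for name, label in labels.items()
        }


meta = MetaIndexSketch({
    "x": pd.Index([10, 20, 30]),
    "y": pd.Index(["a", "b"]),
})
result = meta.sel({"x": 20, "y": "b"})  # -> {"x": 1, "y": 1}
```

The real implementation must additionally handle alignment, renaming, and index propagation, which is exactly what the helper class is meant to factor out.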

TODO / TO FIX:

  • How to allow custom __init__ options in subclasses to be passed to all the type(self)(new_indexes) calls inside the MultiPandasIndex "base" class? This could be done via **kwargs passed through; however, mypy will certainly complain (Liskov Substitution Principle).
  • Is MultiPandasIndex a good name for this helper class?
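One possible answer to the **kwargs question above, sketched with illustrative names (not the PR's actual API): record the subclass-specific options at __init__ time and replay them whenever type(self)(...) rebuilds the index.

```python
# Hypothetical pattern: the base class stores extra options so that
# reconstruction via type(self)(...) preserves both the subclass type
# and its custom settings.
class MultiLikeBase:
    def __init__(self, indexes, **options):
        self.indexes = indexes
        self._options = options  # preserved for reconstruction

    def _replace(self, new_indexes):
        # type(self) keeps the subclass; _options keeps its settings
        return type(self)(new_indexes, **self._options)


class TolerantIndex(MultiLikeBase):
    def __init__(self, indexes, tolerance=0.0):
        super().__init__(indexes, tolerance=tolerance)
        self.tolerance = tolerance


idx = TolerantIndex({"x": None}, tolerance=0.5)
new = idx._replace({"y": None})
# new is still a TolerantIndex with tolerance=0.5
```

mypy will still flag the widened subclass __init__ signature (the LSP issue noted above), so this only addresses the runtime side of the question.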


return type(self)(new_indexes)

def copy(self: T_MultiPandasIndex, deep: bool = True) -> T_MultiPandasIndex:
@benbovy (Member Author):

Needs to be updated following #7140.

@keewis (Collaborator) commented Oct 18, 2022

I wonder if it is possible to create a generic MultiIndex? Something like

ds.set_xindex(
    ["a", "b"],
    MultiIndex([("a", PandasIndex), ("b", PandasIndex), (["a", "b"], BallTreeIndex)]),
)

(but I'm sure we can find a better syntax... maybe create a hashable sequence that is not a tuple, since that is already taken? and change the list of tuples to a dict).

For that concept to work, the return value of MultiIndex would have to be a factory function / class, which would instantiate the actual index.
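The factory idea might look roughly like this (all names and the build step are hypothetical):

```python
# Hypothetical sketch: MultiIndex(spec) does not build an index itself;
# it returns a class-like object that xarray would instantiate later to
# construct the actual per-coordinate sub-indexes.
def MultiIndex(spec):
    class _MetaIndexFactory:
        def __init__(self, variables):
            # placeholder build step: one entry per (coords, index_cls) pair
            self.indexes = [(names, cls) for names, cls in spec]

    return _MetaIndexFactory


factory = MultiIndex([("a", "PandasIndex"), ("b", "PandasIndex")])
index = factory(variables={})
```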

What do you think?


seen = set()
dup_dims = []
for d in dims:
@headtr1ck (Collaborator):

Since dims is a dict, shouldn't the keys be unique? I don't think you will have repetitions here.

@benbovy (Member Author):

Ah yes, sure 🙂. But we should probably check for duplicate dimensions so a dict is not what we want for dims.

@headtr1ck (Collaborator) commented Oct 18, 2022:

You could just do

dim_names = [idx.dim for idx in indexes]
if len(dim_names) != len(set(dim_names)):
    raise ValueError(...)
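For comparison, a runnable sketch of the loop-based variant that also reports which dimensions are duplicated (function name is illustrative):

```python
# Collect dimension names that appear more than once, preserving order.
def find_duplicate_dims(dims):
    seen = set()
    dups = []
    for d in dims:
        if d in seen:
            dups.append(d)
        seen.add(d)
    return dups


find_duplicate_dims(["x", "y", "x"])  # -> ["x"]
```

The set-length comparison is shorter when only a yes/no answer is needed; the loop variant pays off when the error message should name the offending dimensions.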


__slots__ = ("indexes", "dims")

def __init__(self, indexes: Mapping[Hashable, PandasIndex]):
@headtr1ck (Collaborator):

Always better to use Any for the key type of Mapping inputs: the key type is invariant, so e.g. a dict[str, ...] would be rejected where Mapping[Hashable, ...] is expected.
Use Mapping[Any, PandasIndex]
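For context, a small sketch (function name illustrative) of why Mapping[Any, ...] is the permissive input annotation:

```python
from typing import Any, Mapping

# Mapping is invariant in its key type, so mypy rejects passing a
# dict[str, int] where Mapping[Hashable, int] is expected; annotating
# the parameter as Mapping[Any, int] accepts it. The difference only
# shows up in the type checker, not at runtime.
def total(values: Mapping[Any, int]) -> int:
    return sum(values.values())


total({"x": 1, "y": 2})  # -> 3
```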

def _get_unmatched_names(
self: T_MultiPandasIndex, other: T_MultiPandasIndex
) -> set:
return set(self.indexes).symmetric_difference(other.indexes)
@headtr1ck (Collaborator):

I always find the operator notation easier to read: set(self.indexes) ^ set(other.indexes)
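The two spellings are equivalent:

```python
# Symmetric difference: elements in exactly one of the two sets.
a = {"x", "y"}
b = {"y", "z"}
assert (a ^ b) == a.symmetric_difference(b) == {"x", "z"}
```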


def _get_unmatched_names(
self: T_MultiPandasIndex, other: T_MultiPandasIndex
) -> set:
@headtr1ck (Collaborator):

Use set[Hashable]


return type(self)(new_indexes)

def copy(self: T_MultiPandasIndex, deep: bool = True) -> T_MultiPandasIndex:
@headtr1ck (Collaborator):

Might as well implement the correct deep-copy behavior which passes the memo dict.
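A sketch of what memo-aware deep copying could look like (illustrative class, not the PR's implementation):

```python
import copy

# Forwarding the memo dict to nested deepcopy calls ensures that shared
# sub-objects are copied only once and that reference cycles terminate.
class MultiLike:
    def __init__(self, indexes):
        self.indexes = indexes

    def copy(self, deep=True):
        return copy.deepcopy(self) if deep else copy.copy(self)

    def __deepcopy__(self, memo):
        return type(self)(copy.deepcopy(self.indexes, memo))


orig = MultiLike({"x": [1, 2]})
dup = orig.copy(deep=True)
dup.indexes["x"].append(3)
# orig.indexes["x"] is unchanged: [1, 2]
```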

else:
return type(self)(new_indexes)

def sel(self, labels: dict[Any, Any], **kwargs) -> IndexSelResult:
@headtr1ck (Collaborator):

Use Mapping[Any, Any]

@benbovy (Member Author):

This method should be called internally by Xarray (always with a dict), but there may be cases where one wants to call it directly, so Mapping[Any, Any] would be better.

@benbovy (Member Author) commented Oct 18, 2022

I wonder if it is possible to create a generic MultiIndex?

Hmm, that could be possible, but I think there are just too many possible edge cases for something generic like that.

In your specific example

ds.set_xindex(
    ["a", "b"],
    MultiIndex([("a", PandasIndex), ("b", PandasIndex), (["a", "b"], BallTreeIndex)]),
)

we could probably use the BallTreeIndex for point-wise indexing (i.e., with ds.sel(a=xr.DataArray(...), b=xr.DataArray(...))) and use the two PandasIndex instances for other kinds of selection (e.g., with slices, scalars, etc.) so there's no conflict, but I doubt this would be what we want in other cases.

I guess your suggestion is a way around the constraint in the Xarray data model that a coordinate cannot have multiple indexes? I'm afraid there's no easy solution that is generic enough. Maybe some cache to avoid rebuilding the indexes? I.e., .set_xindex() doesn't drop the pre-existing index(es) but rather disables them, so that they can be re-enabled later with another .set_xindex() call (.xindexes only returns the "active" indexes, but there may be other "inactive" indexes attached to a dataset).
